# Multi-objective alignment

GPT2-Large Helpful Reward Model (MIT license)
A GPT2-large model trained on the helpfulness subset of the Anthropic/hh-rlhf dataset. It acts as a reward model that scores how helpful a response is, either for standalone helpfulness detection or as the reward signal in RLHF (Reinforcement Learning from Human Feedback); a usage sketch follows below.
Tags: Large Language Model, Transformers
Author: Ray2333 · Downloads: 2,935 · Likes: 11
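Since the card describes a scalar reward model built with Transformers, here is a minimal sketch of how one might score a response for helpfulness. The repo id `Ray2333/gpt2-large-helpful-reward_model`, the sequence-classification head, and the single-logit output are assumptions based on common reward-model conventions, not details confirmed by this page; the prompt format follows the Anthropic/hh-rlhf "Human/Assistant" convention.

```python
# A minimal sketch, assuming the model loads as a sequence-classification
# head over GPT2-large and emits one scalar logit per input (both assumed).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "Ray2333/gpt2-large-helpful-reward_model"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# Format a prompt/response pair in the hh-rlhf dialogue style.
text = (
    "\n\nHuman: How do I bake bread?"
    "\n\nAssistant: Start by mixing flour, water, yeast, and salt, "
    "then knead, let the dough rise, shape it, and bake at high heat."
)
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Higher scalar output = more helpful, per the usual reward-model convention.
with torch.no_grad():
    reward = model(**inputs).logits.squeeze().item()  # assumes one logit
print(f"helpfulness reward: {reward:.3f}")
```

In an RLHF pipeline, this scalar would typically be computed for each generated response and fed to a policy-optimization step (e.g. PPO) as the reward; here it is shown only as a standalone scoring call.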